| rank | model | pass@1 | win rate | Elo |
|---|---|---|---|---|
| 0 | gpt-4-turbo-2024-04-09+cot | 0.820 | 0.928 | 1544.180 |
| 1 | gpt-4-0613+cot | 0.771 | 0.920 | 1519.130 |
| 2 | claude-3-opus-20240229+cot | 0.820 | 0.862 | 1404.876 |
| 3 | gpt-3.5-turbo-0613+cot | 0.590 | 0.801 | 1323.343 |
| 4 | gpt-4-0613 | 0.687 | 0.735 | 1235.863 |
| 5 | codellama-34b+cot | 0.436 | 0.723 | 1225.902 |
| 6 | gpt-4-turbo-2024-04-09 | 0.677 | 0.715 | 1215.286 |
| 7 | claude-3-opus-20240229 | 0.657 | 0.665 | 1163.230 |
| 8 | codellama-13b+cot | 0.360 | 0.656 | 1162.080 |
| 9 | codellama-7b+cot | 0.299 | 0.556 | 1070.951 |
| 10 | deepseek-base-33b | 0.486 | 0.501 | 1019.993 |
| 11 | deepseek-instruct-33b | 0.499 | 0.501 | 1019.080 |
| 12 | gpt-3.5-turbo-0613 | 0.494 | 0.475 | 1000.000 |
| 13 | codetulu-2-34b | 0.458 | 0.463 | 987.470 |
| 14 | deepseek-base-6.7b | 0.435 | 0.446 | 976.538 |
| 15 | magicoder-ds-7b | 0.444 | 0.429 | 960.910 |
| 16 | codellama-34b | 0.424 | 0.410 | 945.742 |
| 17 | mixtral-8x7b | 0.405 | 0.409 | 946.291 |
| 18 | codellama-13b | 0.397 | 0.381 | 927.468 |
| 19 | wizard-34b | 0.434 | 0.360 | 906.376 |
| 20 | wizard-13b | 0.413 | 0.357 | 904.437 |
| 21 | codellama-python-34b | 0.414 | 0.348 | 897.732 |
| 22 | codellama-python-13b | 0.398 | 0.343 | 893.673 |
| 23 | deepseek-instruct-6.7b | 0.412 | 0.318 | 873.682 |
| 24 | phind | 0.397 | 0.306 | 864.398 |
| 25 | phi-2 | 0.335 | 0.292 | 849.806 |
| 26 | codellama-python-7b | 0.359 | 0.290 | 848.583 |
| 27 | mistral-7b | 0.343 | 0.273 | 833.922 |
| 28 | codellama-7b | 0.342 | 0.270 | 833.731 |
| 29 | starcoderbase-16b | 0.342 | 0.268 | 828.742 |